Visualizations - Sales Impact Analysis¶

This phase visualizes temporal patterns in sales performance, focusing on daily and monthly trends across stores. The goal is to uncover seasonality, peak periods, and operational rhythms that shape store outcomes. Unlike other analytical stages, this phase does not involve correlation studies. It concentrates on trend-based insights that show how sales evolve over time, laying the groundwork for more targeted forecasting and strategic planning.

1. Setup & Import Libraries¶


In [1]:
import time 
In [2]:
# Step 1: Setup & Import Libraries
print("Step 1: Setup and Import Libraries started...")
time.sleep(1)  # Simulate processing time
Step 1: Setup and Import Libraries started...
In [3]:
# Data Manipulation & Processing
import math
import numpy as np
import pandas as pd
from pathlib import Path
import scipy.stats as stats
from datetime import datetime
from sklearn.preprocessing import *

# Data Visualization
import seaborn as sbn
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from pandas.plotting import scatter_matrix

# to ensure Plotly works in both Jupyter and HTML export
pio.renderers.default = "notebook+plotly_mimetype"

sbn.set(rc={'figure.figsize':(14,6)})
plt.style.use('seaborn-v0_8')
sbn.set_palette("husl")

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.float_format','{:.2f}'.format)

# Warnings
import warnings
warnings.filterwarnings('ignore')  # suppress all warnings for cleaner notebook output
In [4]:
print("="*60)
print("Rossmann Store Sales Time Series Analysis - Part 2")
print("="*60)
print("All libraries imported successfully!")
print("Analysis Date:", pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'))
============================================================
Rossmann Store Sales Time Series Analysis - Part 2
============================================================
All libraries imported successfully!
Analysis Date: 2025-08-15 20:49:23
In [5]:
print("✅ Setup and Import Libraries completed.\n")
✅ Setup and Import Libraries completed.

In [6]:
# Start analysis

data_viz_begin = pd.Timestamp.now()

bold_start = '\033[1m'
bold_end = '\033[0m'

print("🔍 Part 2 Started ...")
print(f"🟢 Begin Date: {bold_start}{data_viz_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}\n")
🔍 Part 2 Started ...
🟢 Begin Date: 2025-08-15 20:49:23

Restore the Stored DataFrame¶


In [7]:
%store -r df_viz_feat
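
`%store -r` only works inside IPython and relies on the variable having been stored by an earlier notebook. A minimal sketch of a file-based alternative follows, assuming a hypothetical pickle file named `df_viz_feat.pkl`. The helper names `save_frame` and `load_frame` are illustrative and not part of the original project.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_frame(df: pd.DataFrame, path: Path) -> None:
    """Persist a DataFrame so that a later notebook can reload it."""
    df.to_pickle(path)


def load_frame(path: Path) -> pd.DataFrame:
    """Reload a persisted DataFrame, failing loudly if the file is missing."""
    if not path.exists():
        raise FileNotFoundError(f"Run the previous notebook first: {path} not found")
    return pd.read_pickle(path)


# Tiny stand-in for df_viz_feat (made-up values, for illustration only)
demo = pd.DataFrame({"store": [1, 2], "sales": [5263, 6064]})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "df_viz_feat.pkl"  # hypothetical file name
    save_frame(demo, pkl_path)
    restored = load_frame(pkl_path)

# True when the round trip preserved values, dtypes, and index
print(restored.equals(demo))
```

Pickle keeps dtypes such as datetimes intact, which is why it is used here; any other format that preserves dtypes would serve the same purpose.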

View or Display Dataset¶

In [8]:
print("\nTrain Data Preview:")
print("\n",df_viz_feat.head())
Train Data Preview:

         store  dayofweek       date  sales  customers  open     promo stateholiday  schoolholiday  day  week month  quarter  year  isweekend  isholiday  isschoolDay
982643   1115          2 2013-01-01      0          0     0  No Promo       Public              1  Tue     1   Jan        1  2013      False       True        False
982640   1112          2 2013-01-01      0          0     0  No Promo       Public              1  Tue     1   Jan        1  2013      False       True        False
982639   1111          2 2013-01-01      0          0     0  No Promo       Public              1  Tue     1   Jan        1  2013      False       True        False
982638   1110          2 2013-01-01      0          0     0  No Promo       Public              1  Tue     1   Jan        1  2013      False       True        False
982637   1109          2 2013-01-01      0          0     0  No Promo       Public              1  Tue     1   Jan        1  2013      False       True        False

2. Data Visualization¶


In [9]:
# Step 2: Data Visualization
print("Step 2: Data Visualization started...")
time.sleep(1)  # Simulate processing time
Step 2: Data Visualization started...

Box Plots by Time Segment¶
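
Each box in this section summarizes a group by its median, quartiles, and whiskers. The sketch below computes those numbers with pandas for a small made-up series, assuming the common 1.5 × IQR whisker rule. Plotly's own quartile method may yield slightly different quartiles, so the values only approximate what the figures display. The function name `box_summary` is illustrative.

```python
import pandas as pd


def box_summary(values: pd.Series) -> dict:
    """Return the statistics a box plot typically encodes (1.5 * IQR rule assumed)."""
    q1 = values.quantile(0.25)
    median = values.quantile(0.50)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Points beyond the fences are the ones drawn as individual outlier markers
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers.tolist(),
    }


# Made-up daily sales for one group, including a closed day (0) and a spike
sales = pd.Series([0, 4800, 5200, 5600, 6000, 6400, 7000, 25000])
summary = box_summary(sales)

for name, value in summary.items():
    print(f"{name}: {value}")
```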

In [11]:
# Box plot by Month
fig1 = px.box(
    df_viz_feat, 
    x='month', 
    y='sales',
    title='Sales Distribution by Month',
    category_orders={'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun','Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']}
)
fig1.update_layout(title_x=0.5, width = 1200,height=500)
fig1.show(config={'displayModeBar': True, 'displaylogo': False})

# Box plot by Day of Week  
fig2 = px.box(
    df_viz_feat, 
    x='day', 
    y='sales',
    title='Sales Distribution by Day of Week',
    category_orders={'day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']}
)
fig2.update_layout(title_x=0.5, width = 1200,height=500)
fig2.show(config={'displayModeBar': True, 'displaylogo': False})

# Box plot by Year
fig3 = px.box(
    df_viz_feat, 
    x='year', 
    y='sales',
    title='Sales Distribution by Year'
)

fig3.update_layout(title_x=0.5, width = 1200,height = 500)
fig3.show(config={'displayModeBar': True, 'displaylogo': False})

# Simple summary statistics
print("Sales Distribution Summary by Category:")
print("=" * 45)

print("\nBy Month:")
monthly_stats = df_viz_feat.groupby('month')['sales'].agg(['mean', 'median', 'std']).round(0)
# The loop variable is named `row` so it does not shadow the `scipy.stats` import alias
for month, row in monthly_stats.iterrows():
    print(f"{month}: Mean=€{row['mean']:,.0f}, Median=€{row['median']:,.0f}")

print("\nBy Day:")
daily_stats = df_viz_feat.groupby('day')['sales'].agg(['mean', 'median', 'std']).round(0)
for day, row in daily_stats.iterrows():
    print(f"{day}: Mean=€{row['mean']:,.0f}, Median=€{row['median']:,.0f}")
Sales Distribution Summary by Category:
=============================================

By Month:
Apr: Mean=€5,739, Median=€5,718
Aug: Mean=€5,693, Median=€5,633
Dec: Mean=€6,827, Median=€6,728
Feb: Mean=€5,645, Median=€5,610
Jan: Mean=€5,465, Median=€5,483
Jul: Mean=€6,023, Median=€5,878
Jun: Mean=€5,761, Median=€5,729
Mar: Mean=€5,785, Median=€5,750
May: Mean=€5,490, Median=€5,717
Nov: Mean=€6,008, Median=€6,081
Oct: Mean=€5,537, Median=€5,567
Sep: Mean=€5,570, Median=€5,504

By Day:
Fri: Mean=€6,704, Median=€6,422
Mon: Mean=€7,798, Median=€7,300
Sat: Mean=€5,857, Median=€5,419
Sun: Mean=€203, Median=€0
Thu: Mean=€6,216, Median=€5,995
Tue: Mean=€7,006, Median=€6,463
Wed: Mean=€6,536, Median=€6,115
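
The summary above lists groups alphabetically, and the Sunday figures (mean €203, median €0) suggest that closed days with zero sales are included in the aggregation. A minimal sketch of one way to address both points follows, assuming the `open`, `day`, and `sales` columns shown in the data preview. The small frame and the name `summarize_open_days` are made up for illustration.

```python
import pandas as pd

DAY_ORDER = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def summarize_open_days(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and median sales per weekday, open days only, in calendar order."""
    open_days = df[df["open"] == 1]
    result = open_days.groupby("day")["sales"].agg(["mean", "median"])
    # reindex imposes calendar order; weekdays absent from the data become NaN rows
    return result.reindex(DAY_ORDER)


# Made-up rows: two closed Sundays would drag the Sunday mean toward zero
demo = pd.DataFrame({
    "day":   ["Sun", "Sun", "Mon", "Mon", "Tue", "Sun"],
    "open":  [0, 0, 1, 1, 1, 1],
    "sales": [0, 0, 7000, 8000, 6500, 3000],
})

summary = summarize_open_days(demo)
for day, row in summary.dropna().iterrows():
    print(f"{day}: Mean=€{row['mean']:,.0f}, Median=€{row['median']:,.0f}")
```

The same `reindex` call with a list of month abbreviations would put the monthly summary in calendar order. Whether closed days belong in the statistics depends on the question being asked, so the filter is a choice rather than a correction.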
In [ ]:
print("✅ Data Visualization completed.\n")

In [ ]:
print("✅ Visualization (II) completed successfully!")
print(f"🗓️ Analysis Date: {bold_start}{pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
In [ ]:
# End analysis
data_viz_end = pd.Timestamp.now()
duration = data_viz_end - data_viz_begin

# Final summary print
print("\n📋 Data Viz Summary")
print(f"🟢 Begin Date: {bold_start}{data_viz_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
print(f"✅ End Date:   {bold_start}{data_viz_end.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
print(f"⏱️ Duration:   {bold_start}{str(duration)}{bold_end}")
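
The begin/end bookkeeping above can also be wrapped in a reusable helper. The sketch below uses only the standard library. The name `timed_phase` is hypothetical and not part of the original notebook.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timedelta


@contextmanager
def timed_phase(label: str, results: dict):
    """Record how long the enclosed block takes and print a short summary."""
    begin = datetime.now()
    start = time.perf_counter()
    try:
        yield
    finally:
        # perf_counter is monotonic, so the duration is unaffected by clock changes
        elapsed = timedelta(seconds=time.perf_counter() - start)
        results[label] = elapsed
        print(f"{label}: began {begin:%Y-%m-%d %H:%M:%S}, duration {elapsed}")


durations = {}
with timed_phase("Data Viz", durations):
    time.sleep(0.05)  # stand-in for the real plotting work
```

Collecting the durations in a dictionary makes it possible to print one summary for all phases at the end of the notebook.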

Project Design Rationale: Notebook Separation¶

To ensure clarity, maintainability, and scalability while staying within GitHub's file size limits, each .ipynb notebook should be modularized by task. This separation streamlines version control, eases collaboration, and makes long-term project management more efficient.
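
A quick size check can flag notebooks that are approaching a repository limit before they are committed. The sketch below is a minimal illustration using only the standard library. The function name `oversized_notebooks` is hypothetical, and the 50 MiB default is an assumed warning threshold that should be adjusted to whatever limit applies to the repository.

```python
import tempfile
from pathlib import Path


def oversized_notebooks(root: Path, limit_bytes: int = 50 * 1024 * 1024):
    """Return (path, size) pairs for .ipynb files above limit_bytes, largest first."""
    hits = []
    for path in root.rglob("*.ipynb"):
        size = path.stat().st_size
        if size > limit_bytes:
            hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Demonstration on throwaway files with a tiny limit (assumed values)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "small.ipynb").write_bytes(b"x" * 10)
    (root / "large.ipynb").write_bytes(b"x" * 2048)
    flagged = [(p.name, size) for p, size in oversized_notebooks(root, limit_bytes=1024)]

print(flagged)
```

Embedded Plotly figures are a common cause of large notebooks, so a check like this is most useful after cells that render many interactive charts.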